Experimentally Manipulated Bias in School Psychologists' Scoring of WISC-III Protocols

A presentation to the Midwest Educational Research Association 
Division D: Measurement and Research Methodology 
Chicago, IL
October 26, 2001 

Lawrence W. Sherman (shermalw@muohio.edu) and Amy N. Taylor^

Department of Educational Psychology 
Miami University 
Oxford, OH 45056
Available on the web at: 
Abstract. Experimenter bias effects were experimentally manipulated in a sample of school
psychologists' (n = 97) scoring of three subtests (Similarities, Vocabulary, Comprehension) of the
WISC-III. First-year students (n = 29), interns (n = 42), and experienced school psychologists (n = 26)
were randomly assigned to either a bias or a control group and asked to score identical protocols for the
three subtests. No statistically significant interactions between experimental condition (bias vs. control)
and level of experience (first-year vs. intern vs. experienced) were obtained. The main effect of the bias
condition was non-significant on all three subtests, and level of experience produced only a marginally
significant main effect on Comprehension. These results were interpreted as an affirmation of the
objectivity of scoring for these relatively subjective subtests, as well as of the quality of training of these
students, interns, and experienced practitioners.

Intelligence tests are an integral part of educational planning and placement. The most widely
used intelligence test currently on the market is the Wechsler Intelligence Scale for Children, Third
Edition (WISC-III). Although great efforts have been made to make this test a standardized and objective
measure, some subtests have been shown to be vulnerable to examiner subjectivity. Earlier research on
previous versions of the WISC has indicated that several sources of bias can significantly influence an
examiner's scoring of the Wechsler scales (Sattler, 1992; Massey, 1964; Miller, 1970; Miller & Chansky, 1972;
Sattler, Squire, & Andres, 1977; Slate & Chick, 1989; Slate & Jones, 1990; Slate, 1993; Wheeler, 1987;
Kirchner, 1979; Shannon, 1985; O'Reilly, 1989). Because the WISC and its subsequent revisions, the
WISC-R and WISC-III, are commonly used to determine a variety of special education classifications, it is
important to know whether the latest revision of this measure is reliable and free from bias.

Rosenthal's (1976; 1994) notion of "experimenter bias" suggests that an examiner's
diagnosis of a client may be unintentionally influenced by bias, especially in the relatively subjective
scoring systems associated with three specific subtests of the WISC-III. The present study focused on
the effects of an experimentally induced disability bias, Down Syndrome. The independent variable, to
which scorers were randomly assigned, contrasted an experimental group that received this bias with a
control group that did not. Three levels of experience (first-year school psychology students, third-year
school psychology interns, and experienced certified school psychologists) were considered as a
moderator variable. The three dependent measures were the subjects' scores on the Similarities,
Vocabulary, and Comprehension subtests of the WISC-III. Based on prior research on expectancy bias
and on errors observed in Wechsler scale scoring by school psychologists and trainees, the present study
sought to determine whether a completed WISC-III protocol might also be prone to these influences.
We therefore hypothesized that the biased group would assign significantly lower scores than the control
group. Of the 13 WISC-III subtests, the three we used are often described as the most subjective and the
most vulnerable to external bias. We also hypothesized that the difference between the biased and control
groups would be smallest for the experienced school psychologists, larger for the interns, and largest for
the novice school psychology trainees.



^This study reports findings originally presented in Amy N. Taylor's Specialist Degree Thesis at Miami
University, Oxford, Ohio, 2001. We are thankful for Dr. Alex Thomas' support and assistance in completing
this project. He greatly facilitated the recruitment of many of the subjects who volunteered to participate
in this study.
Method



Participants: 

One adolescent with Down Syndrome was selected, with permission from his primary caregiver, to
complete a WISC-III protocol. No identifying information about this individual was known to anyone but
the researchers, and his anonymity was protected throughout the study. The volunteer scorers (29
first-year school psychology graduate students, 42 school psychology graduate student interns, and 26
certified school psychologists practicing in the field) were randomly assigned to either the bias or the
control condition. The first-year students and the interns came from various training programs in Ohio,
including Miami University, Kent State University, and the University of Akron. The certified school
psychologists were randomly selected from the South Western Ohio School Psychology Association
database.
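The paper states only that half of each experience level received the bias, without describing the mechanics of the randomization. The minimal sketch below shows one way such blocked (within-level) random assignment could be carried out. The function name `assign_within_levels`, the seed parameter, and the rule that an odd-sized level places its extra scorer in the control group are all assumptions of this illustration, not the authors' documented procedure.

```python
import random
from typing import Dict, List


def assign_within_levels(
    scorers: Dict[str, List[str]], seed: int = 0
) -> Dict[str, Dict[str, List[str]]]:
    """Randomly split each experience level into bias and control groups.

    `scorers` maps a (hypothetical) level label such as "first-year" to a
    list of anonymous scorer IDs. Assumption: when a level has an odd count,
    the extra scorer goes to the control group.
    """
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    assignment: Dict[str, Dict[str, List[str]]] = {}
    for level, ids in scorers.items():
        shuffled = list(ids)  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment[level] = {"bias": shuffled[:half], "control": shuffled[half:]}
    return assignment
```

Under this sketch's odd-count rule, levels of 29, 42, and 26 scorers would be split 14/15, 21/21, and 13/13 (bias/control); the study's actual cell sizes may differ.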

Materials: 



The WISC-III was administered to the adolescent with Down Syndrome. His answers to the
Similarities, Vocabulary, and Comprehension subtests were transcribed onto a blank protocol and coded
according to which group of subjects would receive the protocol (i.e., bias vs. control and level of
training). This allowed the researchers to identify the group to which each protocol belonged without
identifying any subjects, who for the most part remained anonymous. The protocols were coded by
slightly changing one response on the Similarities subtest according to the group to which they belonged.
Within each group, half of the transcriptions included a sheet of paper indicating that the individual who
completed the protocol had Down Syndrome, while the other half were accompanied by no information.
The materials told the subjects nothing about the study's purpose; subjects were told only to score the
subtests as best they could without using the WISC-III manual or any other aid. Subjects were asked not
to use the manual to prevent them from sharing manuals, which could lead to collaboration on responses
and thereby taint the data.

Procedure: 



One of the researchers first administered the WISC-III Similarities, Vocabulary, and
Comprehension subtests to the adolescent with Down Syndrome. When the protocol was completed, the
researcher transcribed the answers on the Similarities, Vocabulary, and Comprehension subtests onto six
blank protocols. Each of these six protocols was coded according to the group to which it belonged, and
copies of each transcription were then made. The code placed on each protocol indicated the subjects'
level of training and whether or not they received the bias, which facilitated a "double-blind" element of
the study. Half of the subjects at each level of training were randomly assigned to receive the bias and the
other half were not. The bias consisted of a small sheet of paper stating that an individual with Down
Syndrome had completed the three subtests. All subjects also received instructions with their
transcriptions asking them to score the subtests without using any scoring guide and not to share their
answers or protocols with anyone else, to ensure confidentiality. They were also instructed to mail their
completed protocols back to the researchers, via the Educational Psychology office at Miami University, in
the enclosed self-addressed envelopes. No identifying information was placed on any protocol, so the
subjects remained anonymous.

The researchers knew only whether each subject was a first-year graduate student, an intern, or a
certified school psychologist, and whether or not that subject had received the bias. Thus, the researchers
were "blind" as to which individual received which experimental condition. The researchers then
tabulated each subject's raw score on each subtest, and these scores served as the dependent measures
in later analyses.
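The coding scheme described above is what lets anonymous returns be sorted into the six design cells before analysis. The sketch below illustrates that bookkeeping step under loudly stated assumptions: the single-letter codes "A" to "F" and the `CODE_TO_CELL` table are hypothetical stand-ins, since the study marked protocols by subtly altering one Similarities response and does not list its actual codes.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical codes: the real study used a slightly altered Similarities
# response as the marker, and its actual codes are not reported.
CODE_TO_CELL: Dict[str, Tuple[str, str]] = {
    "A": ("first-year", "bias"),
    "B": ("first-year", "control"),
    "C": ("intern", "bias"),
    "D": ("intern", "control"),
    "E": ("certified", "bias"),
    "F": ("certified", "control"),
}


def cell_means(returned: List[Tuple[str, float]]) -> Dict[Tuple[str, str], float]:
    """Group anonymous (code, raw score) pairs by design cell and average them."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for code, raw_score in returned:
        # An unrecognized code raises KeyError rather than being silently dropped.
        groups[CODE_TO_CELL[code]].append(raw_score)
    return {cell: mean(scores) for cell, scores in groups.items()}
```

Only the code and the raw score travel together, so no scorer identity is needed at any point in this step.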

Research Design: 



This study used a randomized posttest-only control group design (see Table 1). The subjects in
each of the three experience groups (students, interns, and certified practitioners; the moderating
variable) were randomly assigned to receive or not receive the bias (the independent variable). The
posttest consisted of determining whether raw scores (the dependent variable) differed significantly
between the groups that did and did not receive the bias. Further, the researchers sought to determine
whether these score differences varied across the three experience levels (the moderating variable). We
hypothesized that the mean scores of the biased groups would be significantly lower on each of the three
subtests than the control groups' mean scores. Further, we hypothesized that the difference between
scores in the biased and control conditions would be greatest for the first-year students, smaller for the
interns, and smallest for the certified school psychologists (see Table 2).
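Assuming a standard fixed-effects, between-subjects 2 x 3 ANOVA (a = 2 bias conditions, b = 3 experience levels, N = 97 scorers), the degrees of freedom and F ratios reported in Table 4 follow from the usual partition below. This is a textbook sketch of the design's arithmetic, not a description of the statistical package's internal computations.

```latex
\begin{aligned}
df_{\text{bias}} &= a - 1 = 1, \qquad
df_{\text{experience}} = b - 1 = 2, \qquad
df_{\text{interaction}} = (a - 1)(b - 1) = 2, \\
df_{\text{error}} &= N - ab = 97 - 6 = 91, \qquad
F_{\text{effect}} = \frac{MS_{\text{effect}}}{MS_{\text{error}}}
\end{aligned}
```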

Table 1. Symbolic Representation of Research Design

                          Similarities    Vocabulary     Comprehension
Group I: Students         Rn-X-O1         Rn-X-O1        Rn-X-O1
                          Rn-C-O2         Rn-C-O2        Rn-C-O2
Group II: Interns         Ri-X-O3         Ri-X-O3        Ri-X-O3
                          Ri-C-O4         Ri-C-O4        Ri-C-O4
Group III: Certified      Rc-X-O5         Rc-X-O5        Rc-X-O5
                          Rc-C-O6         Rc-C-O6        Rc-C-O6

Rn = Randomly selected first-year graduate students in School Psychology (stu)
Ri = Randomly selected interns (third year of study) in School Psychology (int)
Rc = Randomly selected certified School Psychologists (cer.)
X = Bias given (treatment)
C = Control (no bias given)
O = Scores
Table 2. Research Hypotheses I and II

Hypothesis I      Hypothesis II
O1 < O2           [O2 - O1] > [O4 - O3] > [O6 - O5]
O3 < O4
O5 < O6

O1 = Scores for experimental group I
O2 = Scores for control group I
O3 = Scores for experimental group II
O4 = Scores for control group II
O5 = Scores for experimental group III
O6 = Scores for control group III
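The predictions in Table 2 reduce to simple comparisons among the six cell means. The hypothetical helper `check_hypotheses` below only checks the predicted direction (Hypothesis I) and ordering (Hypothesis II) of the means, using the O1 to O6 labels defined in Table 2. It is not a significance test; in the study itself the hypotheses were evaluated with the ANOVAs reported in Table 4.

```python
from typing import Dict


def check_hypotheses(means: Dict[str, float]) -> Dict[str, bool]:
    """Evaluate Table 2's directional predictions on six cell means.

    `means` uses keys "O1".."O6" as defined in Table 2: odd-numbered keys are
    the bias (experimental) groups, even-numbered keys the control groups.
    """
    # Hypothesis I: each bias group's mean is below its control group's mean.
    h1 = all(means[f"O{2 * g - 1}"] < means[f"O{2 * g}"] for g in (1, 2, 3))
    # Hypothesis II: the control-minus-bias gap shrinks with experience.
    gap_students = means["O2"] - means["O1"]
    gap_interns = means["O4"] - means["O3"]
    gap_certified = means["O6"] - means["O5"]
    h2 = gap_students > gap_interns > gap_certified
    return {"hypothesis_I": h1, "hypothesis_II": h2}
```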



Results

The experimental treatment (bias vs. control) did not significantly interact with level of
experience on any of the three subtests. These results are reported in Tables 3 and 4. No significant
main effects were obtained on the Similarities or Vocabulary subtests. The Comprehension subtest did
show one marginally significant main effect for level of experience (F(2, 91) = 3.66, p = .02), with
first-year students scoring their protocols significantly higher (Scheffé = 3.14) than the experienced
practitioners, who scored their protocols the lowest. The third-year school psychology interns' scores
fell in the middle and were not significantly different from those of either the first-year students or the
experienced practitioners.
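The Results report a Scheffé comparison among experience levels on Comprehension (Scheffé = 3.14) but not the formula used. One common form of the Scheffé pairwise statistic is sketched below as an assumption. Programs differ in whether they divide by (k - 1), as this sketch does, and compare the result with the critical F(k - 1, df_error), or leave it unscaled and compare it with (k - 1) times that critical value; the two conventions reach the same decision. The function name `scheffe_pairwise` is hypothetical.

```python
def scheffe_pairwise(
    mean_i: float, mean_j: float, n_i: int, n_j: int, ms_error: float, k: int
) -> float:
    """Scheffé statistic for comparing two of k group means.

    Returns (M_i - M_j)^2 / (MS_error * (1/n_i + 1/n_j) * (k - 1)), to be
    compared with the critical F on (k - 1, df_error) degrees of freedom.
    """
    contrast_variance = ms_error * (1.0 / n_i + 1.0 / n_j)
    return (mean_i - mean_j) ** 2 / (contrast_variance * (k - 1))
```

For the Comprehension comparison one would pass MS_error = 9.28 from Table 4 and k = 3 experience levels, together with the relevant group means and sizes.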
Sherman/Taylor: Experimenter Bias 
Table 3. [Table content not recoverable from the source document.]



Table 4. Three 2 x 3 ANOVAs* of Experimental Treatment (Bias/Unbiased) by Experience (3) for Three
WISC-III Subtests

Subtest          Source              df      MS       F       p
Similarities     Bias/Unbiased        1     .61     .21     .64
                 Experience Level     2    5.27    1.84     .64
                 Interaction          2    1.00     .34     .70
                 Error               91    2.87
Vocabulary       Bias/Unbiased        1    5.63     .62     .42
                 Experience Level     2   17.76    1.98     .14
                 Interaction          2    5.43     .61     .55
                 Error               91    8.95
Comprehension    Bias/Unbiased        1    1.65     .17     .67
                 Experience Level     2   33.96    3.66     .02
                 Interaction          2   12.74    1.37     .26
                 Error               91    9.28

*Statistics computed using GB-STAT (Friedman, 1998).
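Each F in Table 4 is the effect's mean square divided by the error mean square, and its p value is the upper tail of the F distribution with the listed degrees of freedom. The sketch below recomputes both from the tabled MS and df using only the Python standard library, so readers can check individual rows. The helper names are hypothetical, and the tail probability is evaluated through the regularized incomplete beta function with a standard continued-fraction expansion (modified Lentz method); this is a swapped-in numerical technique, not necessarily what GB-STAT does. Because the tabled MS values are rounded, recomputed F ratios can differ from the table in the second decimal place.

```python
from math import exp, lgamma, log
from typing import Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    # Symmetry I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster.
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: float, df2: float) -> float:
    """P(F > f) for an F(df1, df2) variable, equal to I_{df2/(df2 + df1*f)}(df2/2, df1/2)."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_test_from_ms(ms_effect: float, df_effect: int, ms_error: float, df_error: int) -> Tuple[float, float]:
    """Return (F, p) for one ANOVA row from its mean squares and degrees of freedom."""
    f = ms_effect / ms_error
    return f, f_upper_tail(f, df_effect, df_error)


# Usage: the Experience Level row for Comprehension in Table 4.
f, p = f_test_from_ms(33.96, 2, 9.28, 91)
print(f"F(2, 91) = {f:.2f}, p = {p:.3f}")
```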

Discussion 

No significant differences in scoring were found between the experimental and control groups on
the Similarities, Vocabulary, or Comprehension subtests; the bias manipulation alone showed no effect on
any of the three subtests. However, level of experience did show a marginally significant effect on the
Comprehension subtest. This effect was not among the hypotheses the researchers originally set out to
test, but it is an interesting finding nonetheless, even if its significance is marginal. Moreover, the finding
did not take the direction the researchers would have expected: on the Comprehension subtest, the
certified school psychologists produced significantly lower means than the first-year students. This
suggests that level of experience may be related to differential scoring on this subtest. It does not
support the researchers' hypothesis that the more experienced an individual is, the less likely he or she
is to be influenced by a bias. The difference may be attributable to experienced school psychologists
being more stringent in their scoring, or to novice school psychology trainees being too lenient. Strictly
speaking, if professionals at every level of experience have mastered the scoring techniques, then
experience alone should not have had a significant effect. This finding suggests that test developers
should look for ways to make the scoring of this subtest less subjective, so that examiners can arrive at
more uniform scores.

This study was limited in several ways. The small number of subjects and their limited
geographic representation make the obtained results less generalizable. Further research should
broaden both the size of the groups and the geographic and demographic diversity of the subjects
involved. The final, and perhaps most serious, limitation of this study was the contrived nature of the
research. Having school psychologists and school psychologists in training score protocols from a child
they have never seen, much less assessed, is unrealistic. In actual practice, the child would be in front
of the examiner, and a "truer" score would likely be obtained. On the other hand, having a child with
Down Syndrome in front of the examiner might even heighten the effect of the bias, given the possible
influence of the child's observable physical attributes. One can never really know what the "true" effects
would be.

While this study failed to confirm hypotheses based on Rosenthal's (1994) experimenter bias
effect, the results are interpreted as an affirmation of the objectivity of scoring for these relatively
subjective subtests, as well as of the quality of training of these students, interns, and experienced
practitioners.